Equilibrium free energies from non-equilibrium metadynamics 
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In this paper we propose a new formalism to map history-dependent metadynamics in a Markovian 
process. We apply this formalism to a model Langevin dynamics and determine the equilibrium 
distribution of a collection of simulations. We demonstrate that the reconstructed free energy is 
an unbiased estimate of the underlying free energy and analytically derive an expression for the 
error. The present results can be applied to other history-dependent stochastic processes such as 
Wang-Landau sampling. 



In recent years increasing attention has been paid to 
the possibility of studying equilibrium thermodynamical 
processes by means of non-equilibrium processes 
0, A major breakthrough in this field is the work of 
Jarzynski [2[ who has demonstrated that it is possible 
to estimate the free energy difference between two states 
as a suitable average of the work done on the system by 
forcing the transition in a finite time. 

More recently, two of us have introduced, on a more 
empirical basis, the metadynamics method [6[ in which 
the free energy as a function of one or more collective 
variables (CVs) is obtained from a non-equilibrium sim- 
ulation. In this method, the dynamics of a system at 
finite temperature is biased by a history-dependent po- 
tential constructed as the sum of Gaussians centered on 
the trajectory of the CVs. After a transient period, the 
free energy dependence on the CVs can be estimated 
as the negative of the bias potential. This method is 
closely related to the local elevation method [7[ , to coarse 
molecular dynamics [1, @| and to the adaptive-force bias 
method [13]. Moreover, as described in Ref. [ll|, meta- 
dynamics can be viewed as a finite temperature extension 
of the Wang-Landau approach [lij] , where the density of 
states of a system is estimated by a Monte Carlo pro- 
cedure in which the acceptance probability of a move 
is modified every time a configuration is explored. The 
practical validity of the metadynamics method has been 
demonstrated in a number of applications to real prob- 
lems @, [[J El EI [H El E3, ES HE El , and an empirical 



way to evaluate the error has been suggested in Ref. [21 1 . 
Attempts at a more formal approach have so far been 
frustrated by the lack of a formalism capable of handling 
a non-Markovian process [22| ■ 

The problem of working with history-dependent dy- 
namics is that the forces (or the transition probabilities) 
on the system depend explicitly on its history. Hence it 
is not a priori clear if, and in which sense, the system can 
reach a stationary state under the action of these dynam- 
ics. In this Letter we introduce a formalism that allows 
us to demonstrate that this is indeed the case, at least 
when the evolution of the system is of the Langevin type. 
We introduce a novel mapping of the history-dependent 
evolution into a Markovian process in the original vari- 



able and in an auxiliary field that keeps track of the con- 
figurations visited. Using this mapping we are able to 
validate rigorously the metadynamics method. In partic- 
ular, we show that the average over several independent 
simulations of the metadynamics biasing potential is ex- 
actly equal to minus the free energy, and we obtain an 
explicit expression for the standard deviation of the sin- 
gle realization from this average. The same formalism 
can be extended to Monte Carlo-like samplings such as 
Wang-Landau and, more generally, to all stochastic pro- 
cesses augmented by an history-dependent term which is 
an explicit function of the system trajectory. 

We will here consider the evolution of the CVs in the 
framework of stochastic differential equations. Dimen- 
sional reduction 23|, |2J] leads in general to a process 
with a complex memory friction and an inertial term. 
However, we have extensively checked [ill, [25| that in 
real systems the quantitative behavior of metadynamics 
is perfectly reproduced by the Langevin equation in its 
strong friction limit. This is due to the fact that in real 
systems all the relaxation times are usually much smaller 
than the typical diffusion time in the CVs space, and 
are therefore averaged out during a metadynamics re- 
construction. Hence, we model the CVs evolution as a 
Langevin- type dynamics. For this dynamics it is possible 
to solve analytically the equilibrium distribution of the 
system. 

Under this assumption the metadynamics equation in 
the CVs s of a system with free energy F(s) becomes 



ds = -D 



F(s)+ dt'g( S ,s(t')) 



s(t) 



dt+V2DdW , 
(1) 

where dW is a Wiener noise, D is the diffusion coefficient 
and we measure the energies in units of temperature. The 
variable s is in general multi-dimensional and d/ds indi- 
cates a vector derivative in the multi-dimensional space of 
the CVs. The second term in square brackets in Eq. Q]is 
the history-dependent potential, generated through the 
kernel g(s, s'). So far g(s, s') has been taken to be a 
Gaussian [fj, |21| in the distance \s — s'\ with a pre- factor 
related to the speed with which we wish to reconstruct 
F(s), but different kernels can be considered. A station- 
ary state can be reached if the system is confined in a 
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region il. The analysis is simplified by considering re- 
flecting conditions at its boundaries dfl and assuming 
that the gradient of the free energy in the direction nor- 
mal to d£l vanishes. Other boundary conditions could 
easily be introduced at the cost of algebraic complica- 
tions. The kernel <?(s, s') is required to satisfy the same 
boundary conditions as F(s) for any fixed value of s' , and 
to be such that the equation 



ds'g(s,s')ip (s')+F(s) = 



(2) 



has a solution for the function <fa(s). 

In order to study the average properties of an ensem- 
ble of independent metadynamics calculations we have 
to transform the stochastic description of Eq. U in a 
probabilistic description. When the stochastic evolu- 
tion is Markovian, this is done using the Fokker-Planck 
equation. However, Eq. [T] contains a history-dependent 
term (the bias potential) and is clearly non-Markovian. 
In order to circumvent this problem we define a time- 
dependent field <p(s; t) 



tp(s;t) 



dtS(s - s(i)) + <p Q (s) 



(3) 



which is made up of two terms: the histogram of the 
positions already visited by the system and a time- 
independent gauge term (po(s) defined by Eq. [2l intro- 
duced to simplify the resulting equations. With this 
choice of the gauge it is implicitly assumed that the initial 
conditions are </?(s;0) = ifo(s). In terms of the variables 



s(t) and <p(s;t) the stochastic process in Eq. [T] can be 
rewritten in the simple form 



ds(t) = —D I ds' 99 ^ ,S '\ (s';t) 



s=s(t) 



dip(s;t) = 5(s - s(t))dt 



dt + V2DdW 

(4a) 
(4b) 



as can be verified by direct substitution. This is 
the crucial step that allows the non-Markovian evo- 
lution of a single dynamic variable s(t) in Eq. Q] to 
be replaced with a Markovian evolution for the ex- 
tended set of variables which includes s(t) and the field 
ip(s;t). In fact, the state of the system at time t + dt, 
[s (t + dt) ,(p (s; t + dt)] depends only on the state of the 
system at time t, [s (t) , tp (s; t)]. The information related 
to the underlying free energy F(s) has disappeared from 
the equation of motion but is still present through the 
initial condition for ip(s;t), see Eq. [3] 

Using the Markovian property it is possible to analyze 
in a rigorous manner the behavior of Eq.|4j In particular, 
by using standard techniques 26] , it is possible to write a 
generalized Fokker-Plank equation and study its asymp- 
totic behavior for large t. We consider an ensemble of 
independent metadynamics runs, and define an ensemble 
density. Since our dynamic variables are the position of 
the walker s and the field y(s), the probability density 
will be a function of s and a functional of ip. We de- 
note this probability as P({(p}, s;t). The Fokker-Planck 
equation for P({(p}, s; t) is 



dP({(p},s;t) SP({cp},s;t) 



dt 



S(p(s) 



-DP({cp},s;t) / ds' 



ds 2 



<p{s')+D 



dP({<p},s;t) 
ds 



, dg(s,s') „ d 2 P({cp},s;t) 

dS ^^ (S )+D d? 

© 



Here, if the dimensionality of the system is higher than 
1, a trace is implied and the second derivative is in fact 
a Laplacian. The probabilistic description in Eq. [5] is 
completely equivalent to the coupled stochastic Eqs. I4"al 
andl4bl 

Equation [5] is our main result and describes the evo- 
lution of an ensemble of metadynamics runs. We would 
like to stress that this result has far more general rele- 
vance than its application to the Langevin model in Eq.[T] 
In fact, our formalism would allow mapping the meta- 
dynamics equations into a Markovian form also before 
performing the dimensional reduction. For example it 
could be applied to the Hamilton equations of motion in 
the canonical coordinates of the system, p and q, aug- 
mented with a Langevin thermostat in order to impose 
the temperature. This would lead to a set of Markovian 



equations in the original coordinates of the system and 
in the field tp(s;t), and to a Fokker-Plank equation in a 
probability P({ip},p, q; t). 

We now look for the limiting distribution of Eq.[5]when 
t — > oo, namely the probability density P which satisfies 
- — ,s,t ^ = 0. Remarkably, this solution is independent 
on s and is 



P(M) = Cexp 



dsds'ip(s) 



d 2 g{s, i 
ds 2 



<P(s') > (6) 



as can be verified by direct substitution. Strictly speak- 
ing not all initial conditions might flow to this solution. 
However, extensive numerical experimentation has shown 
this not to be the case in practical applications. C is 



a normalization constant, and the kernel 



d 2 g(s,s') 



sumed to be symmetric and negative definite. 
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These kernel properties are better discussed through a 
change of basis. The most general form for g with the 
correct properties is 

g(s, s') = ^ ak(s)gkak{s'), with g k > , (7) 



where the afe(s) are the eigenfunctions of the Laplacian 
operator on f2. In the one-dimensional case the label k 
is a positive or null integer and the basis functions are 
a (s) = yiT^and a k (s) = ^2jS cos(^) for k £ 0. For 
a cubic d dimensional domain with side S, the eigenfunc- 
tions can be factorized and the label k is a d dimensional 
vector of positive or null integers. For g k in Eq. [7] we 
can chose the Fourier transform of a general radial func- 
tion. If this is a Gaussian with standard deviation 5s, 
gk <x exp(— - 2 S % S ), where k 2 is the square norm of the 
vector k. It is easily verified that the form in Eq.[7]for the 
kernel is equivalent to adding not only a Gaussian cen- 
tered on the actual value of the CVs, but also reflected 
Gaussians that are positioned at an equal distance on the 
other side of the boundaries. This form of the kernel is 
slightly different from the one introduced in Refs. [rll2lj]. 
but eliminates the systematic errors close to the bound- 
aries that are observed using the simple Gaussians [ll[ 
and produces a reconstructed free energy that is reliable 
everywhere on f2. This has been extensively checked on 
a variety of model systems. 

Equation [6] expresses the probability of obtaining a 
given field ip at the end of a metadynamics simulation. 
Since the negative of the biasing potential is used to esti- 
mate the free energy, we define the error e(s) as the sum 
of the exact underlying free energy and the biasing po- 
tential. Using Equations [2] and [3] we find that the error 
is linearly related to the field tp through 



e(s)= / ds'g(s,s'Ms') 



(8) 



Equation [5] implies that for a specific realization of a 
metadynamics process the probability of finding large er- 
rors in the estimation of the free energy is small. Using 
Eq. O we can explicitly calculate the expected average 
error of a series of runs. Since the distribution is a Gaus- 
sian with respect to tp, the expectation value of this field 
is vanishing. The error e(s) is linear in the field tp(s), and 
consequently also its expectation value is vanishing: 



<£(«)> =0. 



(9) 



Thus, we proved that the average of the biasing potential 
over a series of metadynamics runs provides an unbiased 
estimate for the underlying free energy. 

Using Eq. [5] we can also address the problem of the 
accuracy, determining the expected quadratic deviation 
(e 2 (s)) of a single metadynamics run from the average. 
This expectation value can be easily calculated on the 
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FIG. 1: (a) Ratio between the present expression for the error 
and the formula obtained by fitting results for d = 1 and 2 
in Ref. Since in the d = 3 case the constant C(d) was 

not estimated, we assume here C(3) = C(2). (b) Plot of the 
square root of the sum in Eq. 1121 indicating the dependence of 
the error on 5 s / S for a fixed filling time (see text for details). 



basis of the eigenfunctions of the Laplacian: 



S_2^g k al(s) 
D 



ir 2 k 2 



k^0 



(10) 



The average value of the error in the domain f2 is 



s 2 



9k 



DS d V 7T 2 ^ 2 



(11) 



A formal generalization of these expressions to domains 
of a different shape is straightforward. 

So far the results are quite general and can be used 
to optimize the simulation parameters. Since all the 
metadynamics simulations carried out so far are based 
on a Gaussian kernel, it is useful to specialize our re- 
sults to this case and compare the error estimate in 
Eq. [TT] with the empirical expression given in Ref. 21 1 . 
In order to facilitate the comparison we shall use the 
same conventions as in Ref. 2l|, that is to say rein- 
troducing standard energy units we write in k space 



exp(- 



TT 2 k 2 Ss 2 ■ 

2S 2 



, where the energy w is the 



Gaussian strength and 1/tq is the frequency at which 
the Gaussians are added to the bias. In this case Eq. [TT1 
gives 



PD Tg 



/2tt 



S 



exp(- 



ir 2 k 2 8s 2 
2S 2 



k^0 



r 2l,2 



(12) 



This is to be compared with the empirical expression 
£fit = C(d)y^^-4f , where C(d) is a constant depend- 
ing on the dimensionality, namely 0.5 for d = 1 and 0.3 
for d = 2. Since the dependence on (3, D, S, w and tq 
is identical, we compare here the error as a function of 
5s/ S. In Fig. [Tfa) it can be seen that the empirical ex- 
pression works quite well, in spite of the fact that it was 
fitted on a very small range of Gaussian widths, namely 



4 



5s/S £ [0.003,0.05], and that the total error was aver- 
aged discarding the region near the boundaries. 



stimulating discussions. 



In Ref. [2l| we also noticed that the total simulation 



time required to fill the entire domain is proportional to 
5- (j-) ■ Therefore the sum in Eq. [12] is proportional 
to the square error at fixed simulation time, and is a 
function of the dimensionless ratio Ss/S and of the di- 
mensionality d. As can be observed in Fig. |Tfb), this 
quantity is a decreasing function of the Gaussian width. 
Thus, to optimize the accuracy of a metadynamics cal- 
culation, the width has to be chosen as large as possible, 
the only limit being the resolution needed to describe 
the underlying free energy. The Wang-Landau sampling 
as formulated in Ref. [121 ] can be viewed as a history- 
dependent stochastic sampling in which the kernel is a 
Kronecker delta. The present analysis suggests that the 
use of a smoother kernel might be advantageous. 

As a final remark, we notice that a similar analysis can 
be carried out also in the multiple-walkers extension of 
metadynamics 25], in which N w independent processes 
contribute to the reconstructed free energy. Equation [ibl 
is generalized as 



dip{s;t) = ^2$(s - Si(t))dt 



(13) 



where Si(t) is the trajectory of the walker i. It is straight- 
forward to show that the asymptotic probability distri- 
bution of the system is also in this case independent of 
Si and given by Eq.[5J This confirms the empirical result 
discussed in Ref. [25j that the error does not depend on 
the number of walkers. 

In conclusion, the approach introduced in this Letter 
allows a history-dependent dynamics such as metady- 
namics to be mapped in a Markovian process where the 
estimated free energy is treated as a dynamical variable. 
We have applied this formalism to a Langevin model sys- 
tem. When the proper collective variables of a reaction 
are used, this model is representative of a large class of 
realistic systems. Our approach allows this stochastic 
dynamics to be treated in a probabilistic manner and 
to search for its equilibrium distribution. We were able 
to demonstrate analytically the correctness of metady- 
namics, and we obtained an explicit expression for the 
error in the estimated free energy at the end of a meta- 
dynamics simulation. The present work is a step towards 
the understanding of all the sampling methods based on 
adaptive biases. 
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